.TH LIKWID-PERFCTR 1 <DATE> likwid\-<VERSION>
.SH NAME
likwid-perfCtr \- configure and read out hardware performance counters on x86 cpus
.SH SYNOPSIS
.B likwid-perfCtr 
.RB [\-vhVmai]
.RB [ \-c
.IR core_list ]
.RB [ \-g
.IR performance_group
or
.IR performance_event_string ]
.SH DESCRIPTION
.B likwid-perfCtr
is a lightweight command line application to configure and read out hardware performance monitoring data
on supported x86 processors. It can measure either as wrapper without changing the measured application
or with marker functions inside the code, which will turn on and off the counters. There are preconfigured
groups with useful event sets and derived metrics. Additonally arbitrary events can be measured with
custom event sets. The marker API can measure mulitple named regions and the results are accumulated.
The following x86 processor families are supported:
.IP \[bu] 
.B Intel Core 2:
all variants. Counters:
.I PMC0, PMC1, FIXC0, FIXC1
.IP \[bu] 
.B Intel Nehalem:
all variants (apart from Nehalem EX). Counters:
.I PMC0, PMC1, PMC2, PMC3, UPMC0 - UPMC7, FIXC0, FIXC1
.IP \[bu] 
.B Intel Pentium M:
Banias and Dothan variants. Counters:
.I PMC0, PMC1
.IP \[bu] 
.B Intel P6:
Tested on P3.
.IP \[bu] 
.B AMD K8:
all variants. Counters:
.I PMC0, PMC1, PMC2, PMC3
.IP \[bu] 
.B AMD K10:
Barcelona, Shanghai and Istanbul based processors. Counters:
.I PMC0, PMC1, PMC2, PMC3

.SH OPTIONS
.TP
.B \-\^v
prints version information to standard output, then exits.
.TP
.B \-\^h
prints a help message to standard output, then exits.
.TP
.B \-\^V
verbose output during execution for diagnostic reasons.
.TP
.B \-\^m
run in marker mode
.TP
.B \-\^a
print available performance groups for current processor, then exit.
.TP
.B \-\^i
print cpuid information about processor and on Intel Performance Monitoring features, then exit.
.TP
.B \-\^c " processor_list"
specify a numerical list of processors. The list may contain multiple 
items, separated by comma, and ranges. For example 0,3,9-11.
.TP
.B \-\^g " performance group or performance event set string"
specify which performance group to measure. This can be one of the tags output with the -a flag.
Also a custom event set can be specified by a comma separated list of events. Each event has the format
eventId:register with the the register being one of a architecture supported performance counter registers.

.SH EXAMPLE
Because 
.B likwid-perfCtr
measures on processors and not single applications it is necessary to ensure
that processes and threads are pinned to dedicated resources. This can be achieved by chaining the 
.B likwid-perCtr
and 
.B likwid-pin
applications.
.IP 1. 4
As wrapper with performance group:
.TP
.B likwid-perfCtr  -c 1,2  -g TLB   likwid-pin -c 0,1,2 ./cacheBench -n 2 -l 1048576 -i 100 -t Stream
.PP
The parent process is pinned to processor 0, Thread 0 to processor 1 and Thread 1 to processor 2.
We want to measure the threads only. Therefore only processor 1 and 2 are measured.
.IP 2. 4
As wrapper with custom event set on AMD:
.TP
.B likwid-perfCtr  -c 1,2  -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3   likwid-pin -c 0,1-4 ./cacheBench
.PP
Only processors 1 and 2 are measured. It is specified that the event
.B INSTRUCTIONS_RETIRED_SSE
is measured on counter
.B PMC0
and the event
.B CPU_CLOCKS_UNHALTED
on counter
.B PMC3.
It is possible to let likwid-perfCtr calculate the runtime of all threads based on the
.B CPU_CLOCKS_UNHALTED
event. If you want this you have to include this event in your custom event string as shown above.

.IP 3. 4
As wrapper with custom event set on Intel:
.TP
.B likwid-perfCtr  -c 0  -g INSTR_RETIRED_ANY:FIXC0,CPU_CLK_UNHALTED_CORE:FIXC1,UNC_L3_LINES_IN_ANY:UPMC0   likwid-pin -c 0  ./stream-icc
.PP
On Intel processors fixed events are measured on dedicated counters. These are
.B INSTR_RETIRED_ANY
and
.B CPU_CLK_UNHALTED_CORE.
If you configure these fixed counters, 
.B likwid-perfCtr
will calculate the runtime and CPI metric for your run.

.IP 4. 4
Using the marker API to measure only parts of your code (this can be used both with groups or custom event sets):
.TP
.B likwid-perfCtr -m -c 1-4  -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3   likwid-pin -c 0,1-4 ./cacheBench
.PP
You have to link you code against liblikwid.a and use the marker API calls.
The following code snippet shows the necessary calls:

.nf
#include <likwid.h>

/* only one thread calls init */
     if (threadId == 0)
     {
         likwid_markerInit(numberOfThreads,1);
         likwid_markerRegisterRegion("Benchmark");
     }
     BARRIER;
     /* all threads get the regionId */
     Region = likwid_markerGetRegionId("Benchmark");

     likwid_markerStartRegion(threadId, coreId);
     /* your code to be measured is here */

     /* you must specify the threadId and the coreId the thread is running on */
     likwid_markerStopRegion(threadId, coreId, Region);
     BARRIER;
     /* again only one thread can close the markers */
     if (threadId == 0)
     {
         likwid_markerClose();
     }
.fi

For more code examples have a look at the likwid WIKI pages.

.SH AUTHOR
Written by Jan Treibig <jan.treibig@gmail.com>.
.SH BUGS
Report Bugs on <http://code.google.com/p/likwid/issues/list>.
.SH SEE ALSO
likwid-topology(1), likwid-features(1), likwid-pin(1)
